Getting and analyzing remotely sensed open data#
2023.05.23
Our last tutorial will show you how to navigate through a STAC catalog, search within a cloud-hosted data set, and acquire some popular open satellite data sets.
Goals#
Use Intake-STAC to browse a STAC catalog and load the data
Make a simple animation showing a time lapse of the Taoyuan international airport viewed from space (Self-guided)
Installation#
We will use Intake, an interface for loading data into the scientific Python ecosystem. Here we install intake-stac, an Intake-based library for loading STAC catalogs. (Intake itself will be installed along with this library as a dependency.) In addition, we will need hvplot again to visualize our data.
!pip install intake-stac hvplot
Goal 1 procedure#
STAC#
SpatioTemporal Asset Catalogs (STAC) is a specification for describing spatiotemporal data sets, such as satellite images and spatial vector data. Satellite data sets described in the language of STAC are more accessible because users do not have to relearn how to search and access each data set from scratch (which is typical for current commercial data sets). The first version of STAC was released in May 2021, and it has since been adopted by major satellite data providers. You can also check out this blog post by Planet Inc. for more details.
The full specification of STAC as well as numerous tutorials are available on the STAC website. Also, on the STAC Index website you can find a collection of STAC catalogs to explore.
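To get a feel for the specification, here is a stripped-down, hypothetical STAC item. The field names follow the STAC spec, but the id, geometry, and asset URL are made up for illustration:

```python
import json

# A minimal, made-up STAC item: every item is a GeoJSON feature with a
# datetime property and a dictionary of assets (the actual files).
item = {
    "type": "Feature",
    "stac_version": "1.0.0",
    "id": "example-scene-20170831",  # hypothetical id
    "geometry": {
        "type": "Polygon",
        "coordinates": [[[-96, 28], [-93, 28], [-93, 31], [-96, 31], [-96, 28]]],
    },
    "bbox": [-96, 28, -93, 31],
    "properties": {"datetime": "2017-08-31T00:00:00Z"},
    "assets": {
        "visual": {
            "href": "https://example.com/scene.tif",  # hypothetical URL
            "type": "image/tiff; application=geotiff; profile=cloud-optimized",
        }
    },
    "links": [],
}
print(json.dumps(item, indent=2))
```

Because an item is plain JSON, any client in any language can crawl a catalog of them; that is what makes STAC data sets easy to discover.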
Intake#
Intake is a Python-based tool for accessing local and cloud-hosted data. With the STAC plugin (Intake-STAC) installed, we can use Intake to navigate through a STAC catalog, perform basic searches, and load the desired data into memory for further analysis. Intake also provides a Jupyter-based GUI for querying and describing data with proper visualization.
The Planet disaster data#
In this tutorial, we will use the Planet disaster data, a small data set that Planet Inc. prepared to show how STAC fits into a data analysis workflow. We will be looking at Hurricane Harvey, one of the most devastating natural disasters in the U.S. Here we will get the data acquired by the PlanetScope satellite constellation over Houston, TX, U.S. on August 31, 2017, a few days after the hurricane brought record-breaking rainfall. The base URL (i.e., landing page) of this STAC catalog is https://www.planet.com/data/stac/catalog.json; you can try opening it in a web browser to see what happens.
STAC browser#
In fact, there is a better way to visually inspect this STAC catalog than opening it as a plain-text JSON file in the web browser. Try the STAC Browser!

Workflow#
First, let’s import the necessary modules:
import intake
import hvplot.xarray
Now we can use Intake to open the STAC catalog:
catalog_url = "https://www.planet.com/data/stac/" # "https://www.planet.com/data/stac/catalog.json" will also work.
catalog = intake.open_stac_catalog(catalog_url)
catalog
planet:
  args:
    stac_obj: https://www.planet.com/data/stac/
  description: ''
  driver: intake_stac.catalog.StacCatalog
  metadata:
    description: This catalog serves as the main root STAC Catalog for Planet Labs.
      It links to 4 small static catalogs of open data, including a small set of Planet
      Disaster Data, some CC-BY SkySat collects and the complete Spacenet 7 images
      and labels.
    id: planet
    stac_extensions:
    - https://stac-extensions.github.io/stats/v0.2.0/schema.json
    stac_version: 1.0.0
    stats:catalogs:
      count: 9
      versions:
        1.0.0: 9
    stats:collections:
      count: 7
      versions:
        1.0.0: 7
    stats:items:
      assets:
        application/geo+json: 1423
        application/json: 26
        image/jpeg: 1
        image/png: 13
        image/tiff: 2
        image/tiff; application=geotiff: 2253
        image/tiff; application=geotiff; profile=cloud-optimized: 425
        text/xml: 3
      count: 3708
      extensions:
        https://stac-extensions.github.io/eo/v1.0.0/schema.json: 396
        https://stac-extensions.github.io/label/v1.0.0/schema.json: 1423
        https://stac-extensions.github.io/projection/v1.0.0/schema.json: 385
        https://stac-extensions.github.io/raster/v1.0.0/schema.json: 3
        https://stac-extensions.github.io/raster/v1.1.0/schema.json: 371
        https://stac-extensions.github.io/view/v1.0.0/schema.json: 31
      versions:
        1.0.0: 3708
    title: Planet Root STAC Catalog
    type: Catalog
We see lots of information here, which you can also check on the STAC Browser. To see which collections this catalog contains, list its keys:
list(catalog)
['planet-stac-skysat',
 'planet-disaster-data',
 'sn7',
 'planet/fusion/14N/29E-188N',
 'education']
We want to access the Planet disaster data, so we go one layer deeper and get the corresponding STAC collection:
collection = catalog['planet-disaster-data']
collection
planet-disaster-data:
  args:
    stac_obj: https://www.planet.com/data/stac/disasters/collection.json
  description: '[Planet Disaster Data](https://www.planet.com/disasterdata/) makes
    imagery available directly to the public, volunteers, humanitarian organizations,
    and other coordinating bodies in support of the International Charter for Space
    and Major Disasters. Data is released for individual disaster events, providing
    a 30 day window pre- and post-disaster. Imagery is provided under Creative Commons
    licenses, free of charge, with either CC-BY-SA or CC-BY-NC. This STAC catalog
    provides fully public access of a very small subset of the imagery, as a proof
    of concept.'
  driver: intake_stac.catalog.StacCollection
  metadata:
    catalog_dir: ''
All the metadata for this collection is available via:
collection.metadata
{'type': 'Collection',
'id': 'planet-disaster-data',
'stac_version': '1.0.0',
'description': '[Planet Disaster Data](https://www.planet.com/disasterdata/) makes imagery available directly to the public, volunteers, humanitarian organizations, and other coordinating bodies in support of the International Charter for Space and Major Disasters. Data is released for individual disaster events, providing a 30 day window pre- and post-disaster. Imagery is provided under Creative Commons licenses, free of charge, with either CC-BY-SA or CC-BY-NC. This STAC catalog provides fully public access of a very small subset of the imagery, as a proof of concept.',
'stac_extensions': [],
'title': 'Planet Disaster Data',
'keywords': ['disaster', 'open'],
'summaries': {'platform': ['SS02', 'SSC1', '101c'],
'constellation': ['skysat', 'planetscope'],
'eo:cloud_cover': {'minimum': 0, 'maximum': 24},
'eo:gsd': {'minimum': 0.9, 'maximum': 3.7},
'view:off_nadir': {'minimum': 0.2, 'maximum': 27.3},
'view:sun_azimuth': {'minimum': 122, 'maximum': 231.9},
'view:sun_elevation': {'minimum': 56.3, 'maximum': 65.1}},
'providers': [{'name': 'Planet',
'description': 'Contact Planet at https://www.planet.com/contact-sales/',
'roles': ['producer', 'processor'],
'url': 'https://www.planet.com'},
{'name': 'Planet Disaster Team <disaster-team@planet.com>',
'description': 'The Planet Disaster Data Team has released this data as CC-BY-SA-4.0 to help disaster response',
'roles': ['licensor'],
'url': 'https://www.planet.com/disasterdata/'},
{'name': 'Google Cloud Platform',
'description': 'Hosting is done on a GCP storage bucket owned by the Planet Disaster Data team',
'roles': ['host'],
'url': 'https://storage.googleapis.com/pdd-stac/'}],
'extent': {'spatial': {'bbox': [[-96, 28, -93, 31]]},
'temporal': {'interval': [['2017-08-28T10:00:00Z', None]]}},
'license': 'CC-BY-SA-4.0'}
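As a side note, the timestamps in the temporal extent are ISO 8601 strings, so you can parse them with the standard library. (On Python versions before 3.11, `datetime.fromisoformat` does not accept the trailing `Z`, so we normalize it first.)

```python
from datetime import datetime

# Start of the collection's temporal extent, copied from the metadata above.
start = "2017-08-28T10:00:00Z"

# fromisoformat() only accepts the trailing "Z" from Python 3.11 on,
# so rewrite it as an explicit UTC offset to stay portable.
dt = datetime.fromisoformat(start.replace("Z", "+00:00"))
print(dt.year, dt.month, dt.day)  # prints 2017 8 28
```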
We can repeat the use of list to go a few layers deeper, and eventually we will reach a STAC item that points to the desired satellite image:
item = collection['hurricane-harvey']['hurricane-harvey-0831']['Houston-East-20170831-103f-100d-0f4f-RGB']
item
mosaic:
  args:
    chunks: {}
    urlpath: https://storage.googleapis.com/pdd-stac/disasters/hurricane-harvey/0831/Houston-East-20170831-103f-100d-0f4f-3-band.tif
  description: 3 Band RGB Mosaic
  direct_access: allow
  driver: intake_xarray.raster.RasterIOSource
  metadata:
    href: https://storage.googleapis.com/pdd-stac/disasters/hurricane-harvey/0831/Houston-East-20170831-103f-100d-0f4f-3-band.tif
    plots:
      geotiff:
        cmap: viridis
        data_aspect: 1
        dynamic: true
        frame_width: 500
        kind: image
        rasterize: true
        x: x
        y: y
    roles:
    - data
    title: 3 Band RGB Mosaic
    type: image/tiff; application=geotiff; profile=cloud-optimized
thumbnail:
  args:
    chunks: {}
    urlpath: https://storage.googleapis.com/pdd-stac/disasters/hurricane-harvey/0831/Houston-East-20170831-103f-100d-0f4f-3-band.png
  description: Thumbnail
  direct_access: allow
  driver: intake_xarray.image.ImageSource
  metadata:
    href: https://storage.googleapis.com/pdd-stac/disasters/hurricane-harvey/0831/Houston-East-20170831-103f-100d-0f4f-3-band.png
    plots:
      thumbnail:
        bands: channel
        data_aspect: 1
        flip_yaxis: true
        kind: rgb
        x: x
        xaxis: false
        y: y
        yaxis: false
    roles:
    - thumbnail
    title: Thumbnail
    type: image/png
You can see this item has two assets, and one of them is the thumbnail, also known as the browse image:
thumbnail = item['thumbnail']
Now you can try entering thumbnail or thumbnail.describe() for additional information about this image! To get the actual image, you can either download it from the URL,
thumbnail.urlpath
'https://storage.googleapis.com/pdd-stac/disasters/hurricane-harvey/0831/Houston-East-20170831-103f-100d-0f4f-3-band.png'
or use the to_dask() method to lazily stream it as a Dask array.
da_thumbnail = thumbnail.to_dask()
da_thumbnail
<xarray.DataArray (y: 552, x: 549, channel: 3)>
dask.array<xarray-<this-array>, shape=(552, 549, 3), dtype=uint8, chunksize=(552, 549, 3), chunktype=numpy.ndarray>
Coordinates:
  * y        (y) int64 0 1 2 3 4 5 6 7 8 ... 543 544 545 546 547 548 549 550 551
  * x        (x) int64 0 1 2 3 4 5 6 7 8 ... 540 541 542 543 544 545 546 547 548
  * channel  (channel) int64 0 1 2
da_thumbnail.hvplot.image(x='x', y='y', data_aspect=1)
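If you also want a local copy of the thumbnail, a minimal standard-library sketch could look like the following. The filename derivation (last path segment of the URL) is just one possible convention; the helper is defined but the download itself is left commented out:

```python
import posixpath
from urllib.parse import urlparse
from urllib.request import urlretrieve

def local_name(url: str) -> str:
    """Derive a local filename from the last path segment of a URL."""
    return posixpath.basename(urlparse(url).path)

def fetch(url: str) -> str:
    """Download the file next to the notebook and return its local name."""
    name = local_name(url)
    urlretrieve(url, name)  # performs the actual HTTP download
    return name

url = ("https://storage.googleapis.com/pdd-stac/disasters/hurricane-harvey/"
       "0831/Houston-East-20170831-103f-100d-0f4f-3-band.png")
print(local_name(url))  # prints Houston-East-20170831-103f-100d-0f4f-3-band.png
# fetch(url)  # uncomment to actually download the thumbnail
```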
Now let’s explore the high-resolution mosaic data using the visualization steps we saw in the last tutorial!
mosaic = item['mosaic']
da_mosaic = mosaic.to_dask()
da_mosaic
<xarray.DataArray (band: 3, y: 22094, x: 21984)>
dask.array<open_rasterio-501c37a1d96e315c20253204d9aec1ce<this-array>, shape=(3, 22094, 21984), dtype=uint8, chunksize=(3, 22094, 21984), chunktype=numpy.ndarray>
Coordinates:
* band (band) int64 1 2 3
* y (y) float64 3.524e+06 3.524e+06 3.524e+06 ... 3.447e+06 3.447e+06
* x (x) float64 -1.066e+07 -1.066e+07 ... -1.058e+07 -1.058e+07
Attributes: (12/13)
transform: (3.4638558402435815, 0.0, -10657435.586420376, 0...
crs: +init=epsg:3857
res: (3.4638558402435815, 3.4638558402435815)
is_tiled: 1
nodatavals: (nan, nan, nan)
scales: (1.0, 1.0, 1.0)
... ...
AREA_OR_POINT: Area
TIFFTAG_DATETIME: 2017:09:01 15:10:49
TIFFTAG_RESOLUTIONUNIT: 2 (pixels/inch)
TIFFTAG_SOFTWARE: Adobe Photoshop CC 2015.5 (Macintosh)
TIFFTAG_XRESOLUTION: 72
TIFFTAG_YRESOLUTION: 72
Slice a portion of the image and plot an RGB image of that slice:
da_mosaic_subset = da_mosaic.sel(x=slice(-10641000, -10638000), y=slice(3474000, 3471000))  # y decreases downward, so slice from the larger value to the smaller one
da_mosaic_subset.hvplot.rgb(x='x', y='y', data_aspect=1)
Question for you#
Can you find which parts of Houston were affected by Hurricane Harvey?
Goal 2 procedure#
In the latter part of the tutorial, we will show how to use PySTAC Client to perform a simple query over a STAC catalog. Here we will use the Sentinel-2 Cloud-Optimized GeoTIFFs STAC catalog, hosted on AWS by Element 84. The image we used in the previous tutorial also comes from this STAC data set, but this time we want to get many more scenes.
Note
Not all STAC catalogs can be queried. Search is an optional feature that each data provider decides whether to adopt. Thankfully, the popular Landsat and Sentinel catalogs on the AWS open data repository all support STAC queries.
PySTAC client#
PySTAC Client provides a basic interface for working with STAC catalogs and APIs. Intake-STAC partially builds on top of it.
Workflow#
Let’s import the necessary modules first:
import numpy as np
import xarray as xr
import pandas as pd
import pystac_client
Now we pass the STAC catalog URL to PySTAC and do the data query:
# If you are curious about the content of this STAC catalog, don't forget the STAC browser!
catalog_url = "https://earth-search.aws.element84.com/v1"
catalog = pystac_client.Client.open(catalog_url)
results = catalog.search(
    collections=["sentinel-2-l2a"],       # Search within the sentinel-2-l2a collection
    bbox=[121.21, 25.06, 121.26, 25.09],  # Bounding box [lon-left, lat-bottom, lon-right, lat-top], roughly centered on the Taoyuan intl airport
    datetime="2021-01-01/2023-05-22",     # Search within this time span
)
items = results.get_all_items()
print(len(items))  # This shows how many items are available.
169
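Under the hood, PySTAC Client translates these arguments into a request against the catalog's item-search endpoint. Per the STAC API spec, the request body it assembles looks roughly like the dictionary below; the limit value (page size) is an assumption, not something we set above:

```python
import json

# Approximate item-search body corresponding to the query above
# (pystac_client builds and sends this for you).
search_body = {
    "collections": ["sentinel-2-l2a"],
    "bbox": [121.21, 25.06, 121.26, 25.09],
    "datetime": "2021-01-01T00:00:00Z/2023-05-22T23:59:59Z",
    "limit": 100,  # assumed page size; the client paginates through results
}
print(json.dumps(search_body, indent=2))
```

Because the request is plain JSON, you could reproduce the same search with any HTTP client, not just Python.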
Next, we pass items to Intake for the following analysis.
items_intake = intake.open_stac_item_collection(items)
# list(items_intake) # Try this!
# items_intake['S2B_51RUH_20230311_0_L2A'] # Try this!
# items_intake['S2B_51RUH_20230311_0_L2A'].metadata # Try this!
This time, each item offers many scenes. Take S2B_51RUH_20230311_0_L2A (the image we used in the previous tutorial) as an example:
item = items_intake['S2B_51RUH_20230311_0_L2A']
for key in item.keys():
    print(key)
aot
blue
coastal
granule_metadata
green
nir
nir08
nir09
red
rededge1
rededge2
rededge3
scl
swir16
swir22
thumbnail
tileinfo_metadata
visual
wvp
aot-jp2
blue-jp2
coastal-jp2
green-jp2
nir-jp2
nir08-jp2
nir09-jp2
red-jp2
rededge1-jp2
rededge2-jp2
rededge3-jp2
scl-jp2
swir16-jp2
swir22-jp2
visual-jp2
wvp-jp2
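For reference, the spectral keys map to Sentinel-2 bands. The central wavelengths below are nominal values taken from the Sentinel-2 documentation; treat them as approximate:

```python
# Approximate central wavelengths (nm) of a few Sentinel-2 L2A assets.
band_wavelength_nm = {
    "coastal": 443,  # B01
    "blue": 490,     # B02
    "green": 560,    # B03
    "red": 665,      # B04
    "nir": 842,      # B08
    "swir16": 1610,  # B11
    "swir22": 2190,  # B12
}
for name, wavelength in band_wavelength_nm.items():
    print(f"{name}: ~{wavelength} nm")
```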
Each entry listed above is a scene you can access in this item. For example, green is the image observed at the green wavelength, and nir is the image observed in the near infrared. Here we will retrieve visual, the 3-band true-color combination, from every image in the query result.
# Initialize an empty collection
data_list = []
timestamp_list = []
# Loop over each queried image
for scene in items_intake.values():
    timestamp = scene.metadata['s2:generation_time']  # get the timestamp of each image
    da = scene['visual'].to_dask()  # lazily load the data as a Dask array
    da_subset = da.sel(x=slice(320000, 325000), y=slice(2777000, 2772000))  # Slice the image near the airport
    # Keep the scene only if >90% of the sliced pixels are unsaturated -- a crude proxy for less than 10% cloud cover.
    if np.sum(da_subset.values[0, :, :] < 255) / 250000 > 0.9:  # 250000 = 500 x 500 pixels in the slice; 255 is the saturated band value
        timestamp_list.append(timestamp)
        data_list.append(da_subset)
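The cloud filter in the loop is just a fraction-of-valid-pixels test: a scene is kept when more than 90% of its first-band pixels fall below the saturated value 255. Stripped of NumPy, the same rule reads:

```python
def keep_scene(band, saturated=255, min_valid_fraction=0.9):
    """Return True if enough pixels are unsaturated (a crude cloud proxy)."""
    pixels = [value for row in band for value in row]
    valid = sum(1 for value in pixels if value < saturated)
    return valid / len(pixels) > min_valid_fraction

clear = [[10, 20], [30, 40]]      # all 4 pixels unsaturated -> keep
cloudy = [[255, 255], [255, 40]]  # only 1 of 4 unsaturated -> drop
print(keep_scene(clear), keep_scene(cloudy))  # prints True False
```

It is a blunt heuristic (bright rooftops also saturate), but good enough to thin out overcast scenes before building the animation.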
Now we can merge all the data into a giant 4-D data cube!
datacube = xr.concat(data_list, pd.Index(timestamp_list, name="t"))
datacube
<xarray.DataArray (t: 38, band: 3, y: 500, x: 500)>
dask.array<concatenate, shape=(38, 3, 500, 500), dtype=uint8, chunksize=(1, 3, 500, 500), chunktype=numpy.ndarray>
Coordinates:
* band (band) int64 1 2 3
* y (y) float64 2.777e+06 2.777e+06 2.777e+06 ... 2.772e+06 2.772e+06
* x (x) float64 3.2e+05 3.2e+05 3.2e+05 ... 3.25e+05 3.25e+05 3.25e+05
* t (t) object '2023-03-16T06:55:01.000000Z' ... '2021-01-15T05:16:3...
Attributes:
transform: (10.0, 0.0, 300000.0, 0.0, -10.0, 2800020.0)
crs: +init=epsg:32651
res: (10.0, 10.0)
is_tiled: 1
nodatavals: (0.0, 0.0, 0.0)
scales: (1.0, 1.0, 1.0)
offsets: (0.0, 0.0, 0.0)
AREA_OR_POINT: Area
OVR_RESAMPLING_ALG: AVERAGE
Finally, let’s display the data using hvplot’s animation widget. Have fun exploring the data using the display buttons!
datacube.hvplot.rgb(
    groupby="t",  # adds a widget that switches the view along the t axis
    data_aspect=1,
    widget_type="scrubber",
    widget_location="bottom",
)
Question for you#
Can you identify any landscape changes around the Taoyuan international airport between 2021 and 2023?